Real-Time Voice Processing SDK (VoiceControl)

This SDK enables real-time voice interaction with your app using OpenAI's Realtime API and WebRTC.

It consists of three main components:

Express Server (server.ts) – Hosts API and manages the OpenAI session.
WebRTC Integration (VoiceControl.ts) – Captures mic input, streams audio, and executes functions dynamically.
Your Frontend App(for example. App.jsx) - Registers app functions and handles UI updates based on voice events.

Key Functions of VoiceControl.ts

Function	Description
`setupPeerConnection()`	Initializes WebRTC, sends audio to OpenAI, and listens for transcription & functions.
`addFunction()`	Adds a function and its metadata to the SDK.
`handleFunctionCall(event)`	Executes the function requested by OpenAI.
`registerTranscriptionHandler()`	Attaches a live handler to real-time transcriptions.
`handleTranscription(text)`	Logs or handles spoken input from the user.
`stopRecording()`	Stops the microphone and tears down the connection.

WebRTC Data Flow

User Speaks → Audio Captured → WebRTC Sends to OpenAI → OpenAI Transcribes → Function Executes

Frontend

Role of Your Frontend App

Registers available functions so OpenAI can call them.
Sends voice commands to OpenAI via WebRTC.
Updates the UI based on real-time responses.

How It Works

Register Functions → The frontend registers app-specific functions (addTask, deleteTask, etc.).
Start WebRTC Connection → Calls setupPeerConnection() when the mic is activated.
Handle Responses:

When OpenAI sends a function call, the app executes it.
If OpenAI sends a transcription, the UI can display it.

Key Functions in React

Function	Description
`voiceControl.setupPeerConnection()`	Establishes the WebRTC session.
`voiceControl.addFunction()`	Registers UI actions for OpenAI to use.
`voiceControl.registerTranscriptionHandler()`	Receives and processes transcribed text.

Full Workflow

User clicks "Start Recording" → setupPeerConnection() initializes WebRTC.
User speaks → Audio is streamed to OpenAI.
OpenAI transcribes & detects intent:
Frontend executes the function (e.g., marking a task complete).
UI updates in real time.

How to Use

1. Start the Server

npm run build
node server.js

2. Start the React Frontend

npm run dev

3. Register Functions

In your frontend app, register functions that OpenAI should recognize:

voiceControl.addFunction(addTaskFunctionData, addTask);
voiceControl.addFunction(deleteCustomerFunctionData, deleteCustomerByName);

4. Start Recording

Click "Start Recording" to enable voiceControl.setupPeerConnection().

Example Use Cases

Command	Expected Outcome
"Add a task called 'Buy milk'"	`addTask("Buy milk")` is executed.
"Mark 'Eat' as complete"	`toggleTaskCompletedByName("Eat")` is called.
"Delete 'Sleep' task"	`deleteTaskByName("Sleep")` removes it from the UI.

Key Functions of VoiceControl.ts​

WebRTC Data Flow​

Frontend​

Role of Your Frontend App​

How It Works​

Key Functions in React​

Full Workflow​

How to Use​

1. Start the Server​

2. Start the React Frontend​

3. Register Functions​

4. Start Recording​

Example Use Cases​